ABSTRACT


Decentralized estimation problems involve several agents receiving separate noisy observations of a common stochastic process, each of which seeks to generate a local estimate of the state of that process. In the general case, these estimates are desired to be consistent in some way, and thus may be jointly penalized with the state via a cost functional to be minimized. In many cases, each agent need only keep track of its local conditional state probability distribution in order to generate the optimal estimates. This paper examines the boundary between problems where this statistic is sufficient and those where it is not; when it is not, the additional information which must be kept appears to have additional structure, as illustrated by an example.


This work was supported by ONR contract N00014-77-0532C (N041-519).
I. Introduction


Many engineering problems involve a system evolving under the influence of random events, and from which information can be collected by a number of noisy sensors. If one can combine the information received by the sensors, then the problem of determining the state of the system is one of classical estimation and filtering theory [1]. Often, however, the sensors are physically dispersed, and communication resources are scarce, absent, or characterized by nonnegligible delay, so that the problem takes on a more complicated structure. The possibility of reverting to distributed information processing must be considered in these cases, using a scheme in which estimates are computed locally at each sensor site in support of decisions to be made at that site. In such cases, one is concerned with two issues: whether or not the local estimates are accurate in their relationship to the underlying state, and whether or not they lead to consistent decisions despite inaccuracies.

Such problems fall into the class of team theoretic optimization, where the local sensor sites are viewed as separate decision agents acting to achieve some common objective. One of several interesting problems arises when any feedback of the local decisions to the system is ignored, i.e., the problem is one of producing estimates of the system behavior, not controlling it. Applications which exhibit this characteristic include surveillance [2], air traffic control, and multiplatform navigation [3]. The theory which applies to this subclass of problems is that of [4,5], since the lack of feedback and communication (unlike [6,17]) implies a partially nested (PN) information structure. The general approach taken to a PN problem is to reduce it to an equivalent static problem, and this route will often be followed here.

If both direct feedback and communication are prohibited, the only interesting qualitative issue left is that of second-guessing, where each agent considers the errors others are likely to make (inferred through the relationship of the others' observations to its own information) and adjusts its estimate to be consistent with others. In fact, the need for mutually consistent estimates (decisions) and the resulting information retention requirements of the agents are the major intellectual motivation for this paper.

Thus this work addresses some important application problems, but also provides a stepping-stone to an understanding of more complex structures.

The principal question answered is "when is the local conditional probability distribution enough, when is it not enough, and what more is needed in the latter case?" The contributions are a unified treatment of the decentralized estimation problem, some new (and simpler) proofs and interpretations of existing results, but more importantly an example of what may replace the local state distribution in general dynamical problems.

Subsequent sections specify the problem formulation, establish notation, point out why the decentralized estimation problem becomes trivial if there is no need for inter-estimate consistency, and then treat the problem in increasing steps of complexity. First, the static problem is reviewed, then the sequential problem (static system state, but sequential observations which indeed may depend upon an agent's past decisions), and finally the general dynamic case, where the state may evolve randomly in time. It is in the last case where the sufficient statistics start to get interesting, although at least one special case exists.


II. Problem Statement

The specific problem addressed is described here. The general setting is one where the state $x$ of a dynamic system evolves under the influence of a white noise process $w$. Two agents (generalization to more is straightforward) observe signals $y_i$ which depend only on $x$, a local, independent white noise process $v_i$, and a local state $x_i$. Each generates a decision $u_i$ via a decision rule $\gamma_i$ which is restricted to be a function of only the past observations and decisions of that agent. These decisions may affect a local dynamic system (local in that its state $x_i$ depends only on itself, a local white noise process $w_i$, and the local decision $u_i$), permitting the application of these results to decentralized optimal stopping and search problems (Figure 1). The agents seek to minimize the expected value of a cost function $J$ which is additively separable in time. We seek to find statistics $z_1$, $z_2$ and equations determining their behavior such that there exists a pair of decision rules $\hat\gamma_1$, $\hat\gamma_2$ with only $z_1$ (or $z_2$) as arguments, and which performs as well as the best decision rule which uses all past information. (If the $z_i$ lie in a finite-dimensional space, the possible $\hat\gamma_i$ may often be characterized by a finite number of parameters, and the original problem reduced to one of parametric optimization.)

The notation is chosen to facilitate the use of the various independence assumptions available. Subscripts denote the agent with which a variable is associated. Upper case letters are used to denote sequences, e.g.

$$X_i(s:t) = (x_i(s), \ldots, x_i(t)) \tag{2.1}$$

The joint observation and decision are denoted by

$$y(t) = (y_1(t), y_2(t)), \qquad u(t) = (u_1(t), u_2(t)) \tag{2.2}$$

The structural assumptions made are stated formally as:


[Figure 1: System $x(t)$; Local System $x_2(t)$]


A1. Open Loop, Markov System:

$$p(x(t+1) \mid x(t), X_1(t), X_2(t), W(t), W_1(t), W_2(t), U(t), V_1(t), V_2(t), Y(t), t) = p(x(t+1) \mid x(t), w(t), t) \tag{2.3}$$

A2. Markov Local Systems:

$p(x_i(t+1) \mid x_i(t), w_i(t), u_i(t), t)$ completely describes the evolution of $x_i(t+1)$, as in A1.

A3. White Driving Noises:

$w(t)$, $w_1(t)$, and $w_2(t)$ are each independent of all prior random variables.

A4. White Observation Noises:

$v_1(t)$ and $v_2(t)$ are each independent of all prior random variables. Also, $y_i(t)$ is conditionally independent of all prior random variables except $v_i(t)$, $x(t)$, and $x_i(t)$.

A5. Spatial Independence:

$w(t)$, $w_1(t)$, $w_2(t)$ are jointly independent; $v_1(t)$ and $v_2(t)$ are jointly independent.


⁴ $u_i(t-1)$ may be included as part of $x_i(t)$.




A6. Additivity of Objective:

The cost functional $J$ depends only on $X$, $U$, $X_1$, and $X_2$, and is additively separable:

$$J(X, U, X_1, X_2) = \sum_{t=1}^{T} J(x(t), u_1(t), u_2(t), x_1(t), x_2(t), t) \tag{2.4}$$

Of these, A1 and A2 simply pose the problem in state space form, and preclude feedback of actions from local systems to the original system, as well as communication between the local systems. A3 and A4 may be relaxed; if colored driving or observation noise is present, state augmentation can be used to reformulate the problem in this framework.

A6 is the usual assumption which permits dynamic programming approaches to succeed; if the cost is not additively separable in time, then often the state space can be augmented to make it so (and this is one major motivation for the local dynamic models here, so that the optimal stopping problem can be placed in the present framework). However, A5 may be of some concern [7], so it is worth pointing out that correlated observation noise can be treated here.


Lemma 1: A problem with

$$p(y_1(t), y_2(t) \mid x(t)) \neq p(y_1(t) \mid x(t))\, p(y_2(t) \mid x(t)) \tag{2.5}$$

can be reduced to a form satisfying A5.

Proof: Find some statistic $z(t)$ such that

$$p(y_1(t), y_2(t) \mid x(t), z(t)) = p(y_1(t) \mid x(t), z(t))\, p(y_2(t) \mid x(t), z(t)) \tag{2.6}$$

and augment the state so that $x'(t) = (x(t), z(t))$. Thus (2.6) implies the independence of $y_1(t)$ and $y_2(t)$ when conditioned on $x'(t)$. Such a $z(t)$ exists: $z(t) = y(t)$ always works, although statistics of lower dimension may also exist. □
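As a concrete illustration of Lemma 1, consider a hypothetical toy model (not taken from this paper): a binary state $x$, a common noise bit $c$ seen by both sensors, and private noise bits $n_1$, $n_2$, with $y_i = x \oplus c \oplus n_i$. The sketch below enumerates the joint tables and checks that (2.5) holds when conditioning on $x$ alone, while augmenting the state with $z = c$ yields the factorization (2.6). The probabilities `P_C`, `P_N` and all function names are illustrative assumptions.

```python
from itertools import product

# Hypothetical toy model: y_i = x XOR c XOR n_i, with a common noise bit c
# shared by both sensors and independent private noise bits n_1, n_2.
P_C = 0.3    # P(c = 1)
P_N = 0.1    # P(n_i = 1)


def bern(p, b):
    """Probability that a Bernoulli(p) bit equals b."""
    return p if b == 1 else 1.0 - p


def joint_given(x, c=None):
    """p(y1, y2 | x) if c is None, else p(y1, y2 | x, c), as a dict keyed by (y1, y2)."""
    table = {}
    for cc, n1, n2 in product((0, 1) if c is None else (c,), (0, 1), (0, 1)):
        weight = bern(P_N, n1) * bern(P_N, n2)
        if c is None:
            weight *= bern(P_C, cc)          # average over the common noise
        key = (x ^ cc ^ n1, x ^ cc ^ n2)
        table[key] = table.get(key, 0.0) + weight
    return table


def factorizes(table, tol=1e-12):
    """True if the 2x2 joint table equals the product of its marginals."""
    p1 = {a: sum(v for (y1, _), v in table.items() if y1 == a) for a in (0, 1)}
    p2 = {b: sum(v for (_, y2), v in table.items() if y2 == b) for b in (0, 1)}
    return all(abs(table.get((a, b), 0.0) - p1[a] * p2[b]) < tol
               for a in (0, 1) for b in (0, 1))


# Conditioned on x alone, the sensors are dependent through c, so A5 fails ...
print([factorizes(joint_given(x)) for x in (0, 1)])
# ... but conditioned on the augmented state x' = (x, c) they are independent.
print([factorizes(joint_given(x, c)) for x in (0, 1) for c in (0, 1)])
```

Here the augmenting statistic is the common noise itself, which is of lower dimension than the fallback choice $z(t) = y(t)$.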

The above formulation is a bit redundant, as the probabilistic representation of state transitions and observation probabilities obviates the need to explicitly consider the $w$'s and $v$'s. However, this is the formulation most convenient for the derivations which follow.

The redundancy is reduced by assuming that the $w$'s and $v$'s are the only primitive sources of randomness, and that the above state transition and observation distributions are probabilistic representations of deterministic functions. For example⁵,

$$x(t+1) = f(x(t), w(t), t)$$

$$p(x(t+1) \mid x(t), w(t), t) = \delta\big(x(t+1);\, f(x(t), w(t), t)\big)$$

Also, since the general time-varying case is being considered, let the first decision be made at $t = 1$ so that $w(0)$ can represent initial conditions on the state (and $x(0)$ is assumed fixed and known).
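For a finite state set, the delta representation above can be made concrete by building the transition kernel $p(x(t+1) \mid x(t), t) = \sum_w p(w)\,\delta(x(t+1); f(x(t), w, t))$ from a deterministic map. This is a minimal sketch under assumed finite state and noise sets; the random-walk dynamics and the name `kernel_from_map` are hypothetical.

```python
from collections import defaultdict


def kernel_from_map(f, xs, ws, p_w, t):
    """p(x(t+1) | x(t) = x) = sum_w p(w) * delta(x(t+1); f(x, w, t)) on finite sets.

    f   : deterministic dynamics, x(t+1) = f(x(t), w(t), t)
    xs  : finite collection of states
    ws  : finite collection of noise values, with probabilities p_w[w]
    """
    kernel = {}
    for x in xs:
        row = defaultdict(float)
        for w in ws:
            # Kronecker delta: for fixed w, all mass sits on the single successor f(x, w, t).
            row[f(x, w, t)] += p_w[w]
        kernel[x] = dict(row)
    return kernel


# Hypothetical example: a random walk on the integers mod 5 with w in {-1, 0, +1}.
def step(x, w, t):
    return (x + w) % 5


K = kernel_from_map(step, xs=range(5), ws=(-1, 0, 1), p_w={-1: 0.25, 0: 0.5, 1: 0.25}, t=0)
print(K[0])    # {4: 0.25, 0: 0.5, 1: 0.25}
```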

In summary, the quantities needed to specify a problem of this type are:

State Dynamics:

$p(x(t+1) \mid x(t), w(t), t)$

$p(x_i(t+1) \mid x_i(t), w_i(t), u_i(t), t)$, $i = 1, 2$

⁵ The $\delta$ is either Dirac or Kronecker, depending on the structure of the set in which $x(t+1)$ resides.


Driving Noise Statistics:

$p(w(0))$ (initial conditions)

$p(w(t))$

$p(w_i(0))$ (initial conditions), $i = 1, 2$

$p(w_i(t))$, $i = 1, 2$

Sensor Model:

$p(y_i(t) \mid x(t), x_i(t), v_i(t), t)$, $i = 1, 2$

$p(v_i(t))$, $i = 1, 2$

Cost:

$J(x(t), u_1(t), u_2(t), x_1(t), x_2(t), t)$


The overall objective of the problem is to choose the sequences of decision rules $\Gamma_i = \{\gamma_i(\cdot, t),\ t = 1, \ldots, T\}$ which are functions⁶ of the local information $I_i(t)$ (note the assumption of perfect local state information)

$$I_i(t) = (Y_i(t), U_i(t-1), X_i(t)) \supset I_i(t-1) \tag{2.7}$$

and which minimize

$$J(\Gamma_1, \Gamma_2) = \mathop{E}_{W, V_1, V_2, W_1, W_2}\{J(X, U, X_1, X_2)\} \tag{2.8}$$


Since $I_i(t)$ constantly grows in dimension, we seek a smaller but sufficient summary of $I_i(t)$ as a first step in the solution process.


⁶ Strictly, these must be measurable functions of $I_i(t)$ so that the expectation in (2.8) is well defined. This and other technical assumptions required for random variables to be well defined will be made implicitly.




It is important that $J$ jointly penalize the decisions in order to require coordination; otherwise the problem becomes much easier.

Lemma 2: If

$$J(x(t), u_1(t), u_2(t), x_1(t), x_2(t), t) = J_0(x(t), t) + J_1(x(t), u_1(t), x_1(t), t) + J_2(x(t), u_2(t), x_2(t), t) \tag{2.9}$$

then each agent optimizes separately, independent of the structure of the system pertaining to the other agent. Thus a sufficient statistic for each agent is the local state $x_i$ and the local conditional probability distribution on $x$, $p(x(t) \mid Y_i(t))$.


Proof: If (2.9) holds, then (2.8) becomes

$$\mathop{E}_{W, V_1, V_2, W_1, W_2}\Big\{\sum_{t=1}^{T} J_0(x(t), t) + \sum_{t=1}^{T} J_1(x(t), u_1(t), x_1(t), t) + \sum_{t=1}^{T} J_2(x(t), u_2(t), x_2(t), t)\Big\} \tag{2.10}$$

$$= \mathop{E}_{W}\{J_0(X)\} + \mathop{E}_{W, V_1, W_1}\{J_1(X, U_1, X_1)\} + \mathop{E}_{W, V_2, W_2}\{J_2(X, U_2, X_2)\} \tag{2.11}$$

by virtue of the independence of $U_1$ and $X_1$ from $V_2$ and $W_2$ implied by A2-A6 and the structure of $\Gamma_1$. Clearly $\Gamma_1$ only affects the second term; hence it is chosen to minimize

$$J_1(\Gamma_1) = \mathop{E}_{W, V_1, W_1}\{J_1(X, U_1, X_1)\} \tag{2.12}$$

and this is a classical, centralized imperfect state information problem [9]. It is well known that the conditional state distribution is a sufficient statistic for this problem; from the point of view of agent 1, the state of the process external to it which must be considered is $(x(t), x_1(t))$. By assumption, it knows $x_1(t)$ perfectly; thus a sufficient statistic is $x_1(t)$ and the conditional distribution on $x(t)$. A symmetric argument applies to agent 2. □
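Lemma 2 can be checked numerically on a small static (single time step) instance. The sketch below assumes a hypothetical binary state, binary sensors that flip the state with probabilities `EPS`, and separable costs `J0`, `J1`, `J2` of the form (2.9); it enumerates all pairs of deterministic decision rules and compares the team optimum with the sum of the separately optimized terms in (2.11). All names and numbers are illustrative.

```python
from itertools import product

# Hypothetical static example: x in {0, 1}; sensor i reports x flipped with prob. EPS[i].
PRIOR = {0: 0.6, 1: 0.4}
EPS = {1: 0.1, 2: 0.3}
RULES = list(product((0, 1), repeat=2))     # a rule g maps observation y to u = g[y]


def p_obs(i, y, x):
    return 1.0 - EPS[i] if y == x else EPS[i]


# Spatially separable cost of the form (2.9), single time step.
def J0(x):
    return float(x)


def J1(x, u1):
    return float(u1 != x)


def J2(x, u2):
    return 2.0 * float(u2 != x)


def team_cost(g1, g2):
    """E{J0 + J1 + J2} for a pair of decision rules, by full enumeration."""
    return sum(PRIOR[x] * p_obs(1, y1, x) * p_obs(2, y2, x)
               * (J0(x) + J1(x, g1[y1]) + J2(x, g2[y2]))
               for x, y1, y2 in product((0, 1), repeat=3))


def own_cost(i, g, Ji):
    """E{J_i}: depends only on agent i's observation model and rule."""
    return sum(PRIOR[x] * p_obs(i, y, x) * Ji(x, g[y])
               for x, y in product((0, 1), repeat=2))


team_opt = min(team_cost(g1, g2) for g1, g2 in product(RULES, RULES))
separate = (sum(PRIOR[x] * J0(x) for x in (0, 1))
            + min(own_cost(1, g, J1) for g in RULES)
            + min(own_cost(2, g, J2) for g in RULES))
print(team_opt, separate)
```

Each `own_cost` call touches only one agent's sensor model, which is the computational content of "each agent optimizes separately".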

Thus we are particularly interested in cases where (2.9) does not hold, that is, where a spatial additive decomposition of the cost does not exist.

Finally, one implication of the above assumptions will be used repeatedly:

Lemma 3: A1-A6, and the restriction on admissible $\Gamma_i$, imply that

$$p(W_2, V_2, Y_2, U_2, X_2 \mid W, X, V_1, X_1, U_1) = p(W_2, V_2, Y_2, U_2, X_2 \mid W) \tag{2.13}$$

Proof: Decompose the first term in (2.13) using Bayes' rule, then invoke A1-A6 and the structure of $\Gamma_i$ to get

$$p(U_2, X_2 \mid Y_2, W_2)\, p(W_2)\, p(Y_2 \mid V_2, X)\, p(V_2)\, p(X \mid W) \tag{2.14}$$

and note that $W$, and only $W$, appears in the conditioning of (2.14). □

This summarizes the "spatial Markovness" of the structure embodied by A1-A6, and particularly A5. If one agent knows the entire history of the driving noises for the main system, then it can reconstruct the state sequence (from 2.6), and use this to compute statistics on the random variables of the other agent. No other random variables associated with
III. Static Problems

The static team estimation problem has been understood for some time; little can be contributed beyond the existing literature [4,5,7,9,10]. However, as suggested in the introduction, all other problems under consideration can be reduced to this case, so it is worth reviewing to establish the main results.

The static team problem has each agent making one decision based on one observation of the underlying system state (Figure 2 shows the causality relations). The applicable result is:

Theorem 1: For static teams, the local conditional state distribution is a sufficient statistic for the decision rules.


Figure 2. Solution Structure: Static Case
Since $v_1$ does not impact $J$, its expectation may be dropped. The quantity $\mathop{E}_{v_2}\{J(x(1), u_1, u_2(1)) \mid w, y_1\}$ is independent of $y_1$ by Lemma 3 and the nature of $\Gamma_2$. Thus it is only a function of $w(0)$ and $u_1$, and can be precomputed from $\Gamma_2$; call it $\hat J_1(w(0), u_1)$. Then

$$\gamma_1(y_1) = \arg\min_{u_1} \mathop{E}_{w}\{\hat J_1(w(0), u_1) \mid y_1\} \tag{3.5}$$

and clearly $p(w(0) \mid y_1)$ is a sufficient statistic for evaluating this.

□
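The person-by-person step in this proof can be sketched on a hypothetical discrete example: fix $\gamma_2$, precompute $\hat J_1(w, u_1)$ by averaging over agent 2's observation given $w$, and choose $\gamma_1(y_1)$ using only $p(w \mid y_1)$. The binary model, the disagreement penalty `C_DIS` (which couples the agents so that Lemma 2 does not apply), and all names below are illustrative assumptions. The loop compares the resulting best response with a brute-force search over every rule for agent 1.

```python
from itertools import product

# Hypothetical static team with a coupling (disagreement) penalty.
PRIOR = {0: 0.5, 1: 0.5}
EPS = {1: 0.2, 2: 0.35}
C_DIS = 0.8
RULES = list(product((0, 1), repeat=2))     # a rule g maps observation y to u = g[y]


def p_obs(i, y, x):
    return 1.0 - EPS[i] if y == x else EPS[i]


def J(x, u1, u2):
    return float(u1 != x) + float(u2 != x) + C_DIS * float(u1 != u2)


def J1_hat(x, u1, g2):
    """E{J | w = x} over agent 2's observation, with gamma_2 = g2 fixed."""
    return sum(p_obs(2, y2, x) * J(x, u1, g2[y2]) for y2 in (0, 1))


def best_response(g2):
    """gamma_1(y1) = argmin_u1 E{J1_hat(w, u1) | y1}; uses y1 only through p(w | y1)."""
    rule = []
    for y1 in (0, 1):
        joint = {x: PRIOR[x] * p_obs(1, y1, x) for x in (0, 1)}
        norm = sum(joint.values())
        post = {x: p / norm for x, p in joint.items()}          # p(w | y1)

        def expected(u):
            return sum(post[x] * J1_hat(x, u, g2) for x in (0, 1))

        rule.append(min((0, 1), key=expected))
    return tuple(rule)


def team_cost(g1, g2):
    return sum(PRIOR[x] * p_obs(1, y1, x) * p_obs(2, y2, x) * J(x, g1[y1], g2[y2])
               for x, y1, y2 in product((0, 1), repeat=3))


for g2 in RULES:
    g1 = best_response(g2)
    brute = min(team_cost(r, g2) for r in RULES)
    print(g2, g1, abs(team_cost(g1, g2) - brute) < 1e-12)
```

This only computes a best response to a fixed $\gamma_2$; since (3.5) is a necessary condition, such best responses characterize the structure of $\gamma_1$ rather than solve the team problem by themselves.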

The above proof exploits the necessary conditions generated by the person-by-person optimality (PBPO) criterion [4], by assuming $\Gamma_2$ and deriving properties of $\gamma_1$ which must hold for any $\Gamma_2$, including the optimal one. One must be wary of using (3.5) to solve for $\gamma_1$, as it is only a necessary condition; here, we have used it only to characterize structural properties of the $\gamma_i$.

Example: Suppose $w(0) = x \in \mathbb{R}^n$ is a vector Gaussian random variable, that $v_1 \in \mathbb{R}^{p_1}$ and $v_2 \in \mathbb{R}^{p_2}$ are independent Gaussian random variables, and that

$$y_i = H_i x + v_i \tag{3.6}$$

are linear observations. Then the solution to this linear Gaussian (LG) problem is characterized by


Corollary 1a: The conditional mean $E\{x \mid y_i\}$ is a sufficient statistic for the static LG problem.

Proof: By elementary properties of Gaussian random variables, the sufficient statistic $p(w(0) \mid y_i)$ is also Gaussian, and completely defined by its covariance and mean. Its covariance matrix is independent of $y_i$. Thus the conditional mean is sufficient for determining $p(x \mid y_i)$, and hence $\gamma_i$. □
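The Gaussian conditioning formula behind Corollary 1a (given for reference in footnote 7) can be sketched with NumPy. The sketch assumes $x \sim N(\bar x, P)$ and $y_i = H_i x + v_i$ with $v_i \sim N(0, R_i)$ independent of $x$; the function names are hypothetical. It also returns the conditional covariance, to make visible that the covariance does not depend on the observed $y_i$, which is why the conditional mean alone suffices.

```python
import numpy as np


def conditional_mean(mean_x, P, H, R, y):
    """E{x | y} for x ~ N(mean_x, P), y = H x + v, v ~ N(0, R) independent of x."""
    S = H @ P @ H.T + R                               # H P H^T + R
    return mean_x + P @ H.T @ np.linalg.solve(S, y - H @ mean_x)


def conditional_cov(P, H, R):
    """Cov{x | y} = P - P H^T (H P H^T + R)^{-1} H P, which does not involve y."""
    S = H @ P @ H.T + R
    return P - P @ H.T @ np.linalg.solve(S, H @ P)


# Hypothetical scalar example: x ~ N(0, 4), y = x + v, v ~ N(0, 1).
m = np.array([0.0])
P = np.array([[4.0]])
H = np.array([[1.0]])
R = np.array([[1.0]])
print(conditional_mean(m, P, H, R, np.array([2.5])))   # gain 4/5, so [2.]
print(conditional_cov(P, H, R))                        # approximately [[0.8]], whatever y is
```

Using `np.linalg.solve` rather than forming $(H_i P H_i^T + R_i)^{-1}$ explicitly is a standard numerical choice; the result is the same expression.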

Note that this makes no special assumptions on the structure of the cost $J$. However, when $J$ is jointly quadratic in $x$, $u_1$, and $u_2$, the optimal decision rules can be found exactly. Let

$$J(x, u_1, u_2) = \frac{1}{2}\begin{bmatrix} x^T & u_1^T & u_2^T \end{bmatrix}\begin{bmatrix} Q_{00} & Q_{01} & Q_{02} \\ Q_{10} & Q_{11} & Q_{12} \\ Q_{20} & Q_{21} & Q_{22} \end{bmatrix}\begin{bmatrix} x \\ u_1 \\ u_2 \end{bmatrix} \tag{3.7}$$

where $u_i \in \mathbb{R}^{m_i}$ and the compatibly partitioned matrix $Q$ is symmetric and positive definite. (Note $Q_{21} = Q_{12}^T = 0$ when the cost is spatially separable and Lemma 2 applies.)


Theorem (Radner): The optimal decision rules for the static LQG problem are unique and given by⁷

$$u_i = -G_i\, E\{x \mid y_i\}$$

where 

-1  =  ^ll"  ^lAAl1  [Q10  ’  ^12Q22^20] 

and symmetrically for $G_2$.

⁷ For reference, $E\{x \mid y_i\} = E\{x\} + P H_i^T [H_i P H_i^T + R_i]^{-1}(y_i - H_i E\{x\})$, where $P$ is the (unconditional) covariance of $x$, $v_i$ is zero mean, and $R_i$ is the covariance matrix of $v_i$.


Proof: See [9] or [5]. Note that $Q > 0$ implies $Q_{11} > 0$ and $Q_{22} > 0$, and $G_1$ and $G_2$ are well defined.
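For intuition about how the gains $G_i$ depend on the information structure, here is a hedged scalar sketch. It is derived from the person-by-person stationarity conditions rather than copied from the theorem's expression. It assumes zero-mean $x \sim N(0, P)$ and $y_i = x + v_i$ with $v_i \sim N(0, R_i)$, so that $E\{x \mid y_i\} = K_i y_i$ with $K_i = P/(P + R_i)$ and $E\{E\{x \mid y_2\} \mid y_1\} = K_2 E\{x \mid y_1\}$. Substituting $u_j = -G_j E\{x \mid y_j\}$ into $Q_{10} E\{x \mid y_1\} + Q_{11} u_1 + Q_{12} E\{u_2 \mid y_1\} = 0$ (and its counterpart for agent 2) gives two linear equations for $G_1, G_2$, solved below by Cramer's rule. The function name and the example matrix are hypothetical.

```python
def radner_gains_scalar(Q, P, R1, R2):
    """Gains (G1, G2) with u_i = -G_i E{x | y_i}, scalar static LQG team.

    Illustrative assumptions: x ~ N(0, P) with P > 0, y_i = x + v_i,
    v_i ~ N(0, R_i) independent, and Q the symmetric positive definite
    3x3 matrix of (3.7), indexed 0 -> x, 1 -> u1, 2 -> u2.
    """
    K1 = P / (P + R1)      # E{x | y1} = K1 y1
    K2 = P / (P + R2)      # E{x | y2} = K2 y2, hence E{ E{x | y2} | y1 } = K2 E{x | y1}
    # Person-by-person stationarity, after substituting u_j = -G_j E{x | y_j}:
    #   Q11 G1 + Q12 K2 G2 = Q10
    #   Q21 K1 G1 + Q22 G2 = Q20
    a, b = Q[1][1], Q[1][2] * K2
    c, d = Q[2][1] * K1, Q[2][2]
    det = a * d - b * c    # positive when Q > 0, since 0 < K_i <= 1
    G1 = (Q[1][0] * d - b * Q[2][0]) / det
    G2 = (a * Q[2][0] - c * Q[1][0]) / det
    return G1, G2


Q = [[2.0, 1.0, 1.0],
     [1.0, 2.0, 0.5],
     [1.0, 0.5, 2.0]]      # hypothetical symmetric, positive definite cost matrix
print(radner_gains_scalar(Q, P=1.0, R1=0.0, R2=0.0))   # perfect sensors: centralized gains
print(radner_gains_scalar(Q, P=1.0, R1=1.0, R2=3.0))   # noisy sensors
```

With perfect observations ($R_1 = R_2 = 0$, so $K_1 = K_2 = 1$) the equations reduce to the centralized condition on $(u_1, u_2)$, a useful sanity check; with noisier sensors the coupling terms $Q_{12} K_2 G_2$ and $Q_{21} K_1 G_1$ shrink.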

Thus the static case, as well as the special case of Lemma 2, results in the conditional state distribution being a sufficient statistic.


IV. Sequential Problems

Now we move to a slightly more complex case, where the system state evolves deterministically ($w(1) = w(2) = \cdots = w(T-1) =$ constant), but the agents obtain observations and make decisions in a sequence over time. A sufficient statistic must not only supply the requisite information for the current decision, but also must be able to be combined with future observations to generate future sufficient statistics.

First, the problem with a dynamic, but deterministic, evolution of the state $x(t)$ can be reformulated with a time-varying observation structure related to a fixed state, the initial state. If

$$x(t+1) = f(x(t), t), \qquad y_i(t) = h_i(x(t), v_i(t), t) \tag{4.1}$$

then defining

$$F(x(0), 0) = x(0), \qquad F(x(0), t) = f(F(x(0), t-1), t-1), \qquad H_i(x(0), t) = h_i(F(x(0), t), v_i(t), t) \tag{4.2}$$

is a completely equivalent model relating each $y_i(t)$ to the initial state $x(0) = w(0)$. Note that if a distribution on $w(0)$ is known, an equivalent distribution on $x(t)$ can be found by a straightforward change of variables, but the reverse is true only if $F(\cdot, t)$ is invertible (i.e. one-to-one) (and here lies a clue to the answer of the question posed in the introduction).

The remainder of this section will thus focus on $w(0)$ as the only interesting system variable.
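The construction (4.2) is just repeated composition of the deterministic dynamics. A minimal sketch, assuming the maps $f$ and $h_i$ are given as Python callables (the names `make_F`, `make_H` and the example maps are hypothetical), also shows how a non-invertible $f$ makes $F(\cdot, t)$ many-to-one, so that a distribution on $x(t)$ cannot be pulled back to one on $w(0)$.

```python
def make_F(f):
    """Return F(x0, t), the state at time t reached from x(0) = x0 under x(t+1) = f(x(t), t)."""
    def F(x0, t):
        x = x0
        for s in range(t):     # x(s+1) = f(x(s), s) for s = 0, ..., t-1
            x = f(x, s)
        return x
    return F


def make_H(h, F):
    """Return H(x0, v, t) = h(F(x0, t), v, t): the observation written in terms of x(0)."""
    def H(x0, v, t):
        return h(F(x0, t), v, t)
    return H


# Hypothetical example maps.
F = make_F(lambda x, t: 2 * x + t)
H = make_H(lambda x, v, t: x + v, F)
print(F(1, 3), H(1, 0.5, 3))          # 12 12.5

# A non-invertible f: F(., 1) maps 2 and -2 to the same state.
F_sq = make_F(lambda x, t: x * x)
print(F_sq(2, 1) == F_sq(-2, 1))      # True
```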


That the sequential case is closely related to the static problem can be seen by considering the special case where the local states $x_i$ influence neither the observations nor the cost.


Corollary 1.b: If $J(x, u_1, u_2, x_1, x_2) = J(x, u_1, u_2)$ and $p(y_i \mid x, x_i, v_i) = p(y_i \mid x, v_i)$, then the local conditional initial state distribution $p(w(0) \mid I_i(t))$ is a sufficient statistic for $\gamma_i(\cdot, t)$.


Proof:

$$\min_{\Gamma_1, \Gamma_2} J(\Gamma_1, \Gamma_2) = \min_{\Gamma_1, \Gamma_2} \sum_{t=1}^{T} \mathop{E}_{w, V_1, V_2}\{J(w(0), \gamma_1(I_1(t), t), \gamma_2(I_2(t), t), t)\} \tag{4.3}$$

$$= \sum_{t=1}^{T} \min_{\gamma_1(\cdot, t),\, \gamma_2(\cdot, t)} \mathop{E}_{w, V_1, V_2}\{J(w(0), \gamma_1(I_1(t), t), \gamma_2(I_2(t), t), t)\} \tag{4.4}$$

because each choice of a decision rule for a particular time $t$ affects exactly one term in the sum. The choices of $\gamma_i(\cdot, t)$ can be separated, and thus the sum and minimization interchanged. From Theorem 1, $p(w(0) \mid I_i(t)) = p(w(0) \mid Y_i(t))$ is a sufficient statistic for $\gamma_i(\cdot, t)$ solving the inner (static) team optimization in (4.4). Finally, the sufficient statistic for $\gamma_i(\cdot, t+1)$ can be generated from that for $\gamma_i(\cdot, t)$ and from $y_i(t+1)$ via Bayes' Theorem:

$$p(w(0) \mid Y_i(t+1)) = \frac{p(y_i(t+1) \mid w(0))\, p(w(0) \mid Y_i(t))}{p(y_i(t+1) \mid Y_i(t))} \tag{4.5}$$

where the denominator is directly computable from the terms in the numerator (via summation or integration over $w(0)$).

□
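For a finite set of initial states, the recursion (4.5) takes only a few lines. The sketch assumes, as in Corollary 1.b, that successive observations are conditionally independent given $w(0)$; the two-hypothesis model, the likelihood values, and the function names are hypothetical. The usage lines check that the recursive update agrees with the batch posterior computed from all observations at once.

```python
def bayes_update(prior, likelihood, y):
    """One step of (4.5): map p(w(0) | Y_i(t)) and y_i(t+1) to p(w(0) | Y_i(t+1)).

    prior      : dict mapping each candidate w(0) to p(w(0) | Y_i(t))
    likelihood : function (y, w) -> p(y_i(t+1) = y | w(0) = w)
    """
    unnormalized = {w: likelihood(y, w) * p for w, p in prior.items()}
    evidence = sum(unnormalized.values())      # denominator p(y_i(t+1) | Y_i(t))
    return {w: u / evidence for w, u in unnormalized.items()}


# Hypothetical two-hypothesis model with binary observations.
P_Y1 = {"A": 0.8, "B": 0.3}                    # p(y = 1 | w)


def lik(y, w):
    return P_Y1[w] if y == 1 else 1.0 - P_Y1[w]


posterior = {"A": 0.5, "B": 0.5}
observations = [1, 1, 0]
for y in observations:
    posterior = bayes_update(posterior, lik, y)

# Batch posterior from all observations at once, for comparison.
batch = {w: 0.5 for w in P_Y1}
for y in observations:
    batch = {w: batch[w] * lik(y, w) for w in batch}
total = sum(batch.values())
batch = {w: v / total for w, v in batch.items()}
print(posterior, batch)
```

Only the current posterior is carried from step to step, which is the sense in which it is a recursively computable sufficient statistic.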

This argument does not readily generalize to the case where local dynamics are present, as the choice of $\gamma_i(\cdot, t)$ influences not only the cost at time $t$, but also the cost at future times through its effect on $x_i(t+1)$ (which may appear directly in the cost, or which influences future observations and hence future costs). However, the propagation of this effect of each choice of $\gamma_i(\cdot, t)$ is causal, and a nested recursion can be found which, while not a complete separation as in Corollary 1.b, provides enough structure to deduce a sufficient statistic.

Theorem 2: For the general sequential problem, where $x(t) = x(0) = w$, a sufficient statistic for each decision rule $\gamma_i(\cdot, t)$ is the local state $x_i(t)$ combined with the local conditional distribution on $w$.

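Proof: By reverse induction.

Basis: $t = T$. The only term in the cost involving $\gamma_i(I_i(T), T)$ is $J(w, u_1(T), u_2(T), x_1(T), x_2(T))$. Each $\gamma_i(I_i(T), T)$ may be chosen to optimize this term alone. As in Theorem 1, for any $\Gamma_2$,

$$\gamma_1(I_1(T), T) = \arg\min_{u_1} \mathop{E}_{w, W_2, V_2}\{J(w, u_1, u_2, x_1, x_2) \mid I_1(T)\} \tag{4.6}$$

$$= \arg\min_{u_1} \mathop{E}_{w}\Big\{\mathop{E}_{W_2, V_2}\{J(w, u_1, u_2, x_1, x_2) \mid w, I_1(T)\} \,\Big|\, I_1(T)\Big\} \tag{4.7}$$

$$= \arg\min_{u_1} \mathop{E}_{w}\Big\{\mathop{E}_{W_2, V_2}\{J(w, u_1, u_2, x_1, x_2) \mid w\} \,\Big|\, I_1(T)\Big\} \tag{4.8}$$

by Lemma 3. Defining

$$\hat J_1(w, u_1, x_1) = \mathop{E}_{W_2, V_2}\{J(w, u_1, u_2, x_1, x_2) \mid w\} \tag{4.9}$$

it is easy to see that $\gamma_1(\cdot, T)$ can be chosen to minimize

$$\mathop{E}_{w}\{\hat J_1(w, u_1, x_1(T)) \mid I_1(T)\} \tag{4.10}$$

if $p(w \mid I_1(T))$ and $x_1(T)$ are known. Hence the theorem holds at time $T$.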

Induction: Define $z_i(t) = p(w \mid I_i(t))$ for convenience. Assume $\gamma_i(z_i(\tau), x_i(\tau), \tau)$ are fixed for all $\tau = t+1, \ldots, T$; by the induction hypothesis, such $\gamma_i$ exist which are equivalent to the optimal $\gamma_i(I_i(\tau), \tau)$. Define

$$L(z_1(t+1), z_2(t+1), x_1(t+1), x_2(t+1), w, t+1) = \mathop{E}_{W_i(t:T-1),\, V_i(t+1:T)}\Big\{\sum_{\tau=t+1}^{T} J(w, x_1(\tau), x_2(\tau), u_1(\tau), u_2(\tau)) \,\Big|\, I_1(t), I_2(t), w\Big\} \tag{4.11}$$

where the expectation is over the primitive random variables $w_i(\tau)$, $\tau = t, \ldots, T-1$, and $v_i(\tau)$, $\tau = t+1, \ldots, T$, $i = 1, 2$. Note that this is indeed just a function of $z_1$, $z_2$, $x_1$, $x_2$, and $w$ since: the cost at each time is a function of decisions, states, and $w$; the states are functions of decisions, prior states, and independent noise; the decisions are functions of the statistics $z_i$; and the $z_i$ are functions of $w$ and independent noise. Thus all terms in the expectation are, by virtue of A1-A6 and the induction hypothesis, dependent upon $I_1(t)$, $I_2(t)$, and $w$ only through $x_1(t)$, $x_2(t)$, $z_1(t)$, $z_2(t)$, and $w$, precisely the arguments of $L$.

Now, consider the choice of $\gamma_1(\cdot, t)$, again with $\Gamma_2$ and $\gamma_1(\cdot, \tau)$, $\tau = t+1, \ldots, T$, fixed. By the now familiar PBPO arguments, $\gamma_1(I_1(t), t)$ seeks to minimize

$$\mathop{E}_{w, W_2, V_2}\Big\{\sum_{\tau=t}^{T} J(w, x_1(\tau), x_2(\tau), u_1(\tau), u_2(\tau)) \,\Big|\, I_1(t)\Big\}$$

$$= \mathop{E}_{w}\Big\{\mathop{E}_{W_2(t-1), V_2(t)}\Big\{J(w, x_1(t), x_2(t), u_1(t), u_2(t)) + \mathop{E}_{W_i(t:T-1),\, V_i(t+1:T)}\Big\{\sum_{\tau=t+1}^{T} J(w, x_1(\tau), x_2(\tau), u_1(\tau), u_2(\tau)) \,\Big|\, I_1(t), w, W_2(t-1), V_2(t)\Big\} \,\Big|\, w, I_1(t)\Big\} \,\Big|\, I_1(t)\Big\} \tag{4.12}$$

The inner expectation is $L$, since $w$, $W_2(t-1)$ and $V_2(t)$ determine $I_2(t)$. By Lemma 3, the middle expectation is independent of $I_1(t)$, since $w$ is included in the conditioning, and we may define

$$\hat J_1(w, x_1(t), z_1(t), u_1) = \mathop{E}_{W_2(t-1), V_2(t)}\Big\{J(w, x_1(t), x_2(t), u_1(t), u_2(t)) + L(z_1(t+1), z_2(t+1), x_1(t+1), x_2(t+1), w, t+1) \,\Big|\, w\Big\} \tag{4.13}$$

The outer expectation and minimization in (4.12) become

$$\gamma_1(I_1(t), t) = \arg\min_{u_1} \mathop{E}_{w}\{\hat J_1(w, x_1(t), z_1(t), u_1) \mid I_1(t)\} \tag{4.14}$$

for which it is seen that knowledge of $p(w \mid I_1(t))$, $x_1(t)$ and $z_1(t) = p(w \mid I_1(t))$ is sufficient to determine $\gamma_1(\cdot, t)$. □

This result follows directly from the causal structure of the problem. The local state distribution, by Lemma 3, is all that can, and should, be summarized from $I_1(t)$ to predict the entire behavior of agent 2, both at time $t$ and in the future, with all of agent 2's decision rules fixed. This allows agent 1 to predict the impact of agent 2's decisions on the cost as well as if all of $I_1(t)$ were available. $z_1(t) = p(w \mid I_1(t))$ is all that is necessary to minimize the contribution of $u_1(t)$ to the current cost term, as well as to link $I_1(t)$ to future decisions.

The resulting solution architecture is shown in Figure 3. The local estimators are ordinary Bayesian estimators, each with a structure completely determined by the sensor to which it is attached. Feedback of $x_i(t)$ is required to account for its impact on the observation. The agents now implement $u_i(t) = \gamma_i(z_i(t), x_i(t), t)$ as memoryless decision rules.

The structure of the proof of Theorem 2, plus the visualization of Figure 3, which highlights the fact that the statistics $z_i(t)$ evolve as stochastic dynamic systems with inputs $w$ and $x_i(t)$ and driving noise $v_i(t)$, strongly suggests a recursive solution technique, similar to dynamic programming [8], where $L$ plays the role of a cost-to-go function and $(z_1, z_2, x_1, x_2, w)$ that of the state.

This is not quite possible. From Figure 3, and the whiteness of $(v_1(t), v_2(t))$, it is clear that the entire system is Markov with a state of $(z_1, z_2, x_1, x_2, w)$. For a particular choice of $\gamma_1(\cdot, t)$, $\gamma_2(\cdot, t)$, this implies that $p(z_1(t+1), z_2(t+1), x_1(t+1), x_2(t+1), w)$ can be completely determined from $p(z_1(t), z_2(t), x_1(t), x_2(t), w)$ and the $\gamma_i$. However, $L$ does not serve to summarize all costs, other than current ones, necessary to choose $\gamma_1$ and $\gamma_2$: the second step of the proof (4.13) required the additional knowledge of $\gamma_2(\cdot, 0), \ldots, \gamma_2(\cdot, t)$ in order to exploit PBPO conditions for $\gamma_1(\cdot, t)$. Thus the solution technique resulting from (4.13) would only yield expressions for $\gamma_1(\cdot, t)$ in terms of previous choices of $\gamma_2$, and would not separate future from past as in centralized dynamic programming. (The reason is that the choice of $\gamma_1$ depends on $p(u_2(t) \mid w)$, which involves the distribution on $z_2(t)$, which in turn is determined by the prior decision rules of agent 2.)

Figure 3. Optimal Solution Structure: Sequential Observations

However, one can obtain a dynamic programming algorithm by exploiting the joint Markovian structure.

Corollary 2.a: The optimal decision rules for a sequential problem may be determined from a recursion on the joint distribution $p(z_1, z_2, x_1, x_2, w)$:

$$V\big(p(z_1, z_2, x_1, x_2, w), T\big) = \min_{\gamma_1(\cdot, T),\, \gamma_2(\cdot, T)} E\{J(w, x_1, x_2, u_1, u_2, T)\} \tag{4.15}$$

and

$$V\big(p(z_1, z_2, x_1, x_2, w), t\big) = \min_{\gamma_1(\cdot, t),\, \gamma_2(\cdot, t)} \Big[E\{J(w, x_1, x_2, u_1, u_2, t)\} + V\big(p(z_1(t+1), z_2(t+1), x_1(t+1), x_2(t+1), w), t+1\big)\Big]$$

where each expectation is over all the random variables inside it, and the probability measure used to evaluate it is that appearing as the argument to $V$. Each $\gamma_i(\cdot, t)$ is restricted to being a function of $(z_i, x_i)$ only.

Proof: The Markovian nature of $(z_1, z_2, x_1, x_2, w)$ implies that the joint distribution $p(\cdot, \cdot, \cdot, \cdot, \cdot)$ evolves in a purely recursive manner. The deterministic dynamics of this distribution depend only on memoryless control laws of the form specified, independent noise distributions, and local state dynamics; hence it can serve as a dynamic programming state under the conditions specified for the $\gamma_i$. □

This corollary displays the strengths and weaknesses of knowing the sufficient statistics $z_1$ and $z_2$. A decentralized decision problem has been reduced to a deterministic dynamic programming problem, from which conclusions as to the behavior of the system under optimal decision policies may be derived. The price paid for this is that of dimensionality: not only are the $z_i$ of higher dimension than the original states, but the dynamic programming is over a probability distribution including the $z_i$! Thus, while interesting structurally, this result is unlikely to lead to implementable solution techniques because of the double "curse of dimensionality".

Example: Consider the decentralized optimal stopping problem, motivated by [11] and discussed in [12]. The initial state is a binary hypothesis, with known prior distribution $\{p(w = H_A), p(w = H_B)\}$. Each local state $x_i$ is one of three discrete states: continuing ($C_i$), stopped with $H_A$ declared ($A_i$), or stopped with $H_B$ declared ($B_i$). If the local state is $C_i$, observations are statistically related to $w$; otherwise, they are only noise $v_i$. Decisions are available which allow the local state $C_i$ to be changed to any local state, but $A_i$ and $B_i$ are trapping states. Initially $x_i(0) = C_i$. Local error penalties, assessed at the terminal time $T$, penalize any event where the local state does not match the true hypothesis $w$. In addition, local data collecting costs are incurred each time the local state is $C_i$. Finally, to induce coordination, assume that an additional cost is incurred whenever both local states are $C_i$, thus motivating decision behavior where one agent stops quickly but the other may continue. Application of Theorem 2 yields the following characterization of the solution.

Corollary 2.b: A sufficient statistic for the decentralized optimal stopping problem is the local state x_i ∈ {A_i, B_i, C_i} and the local conditional probability of H_A, z_i(t) = p(H_A | Y_i(t)). The optimal decision rule when x_i = C_i is a sequential probability ratio test (SPRT) on z_i(t) with some upper and lower thresholds η_i^u(t) and η_i^l(t), respectively.

Proof: z_i(t) is sufficient to determine the entire conditional distribution, since w is binary. No effective decision can be made unless x_i = C_i. It is straightforward, but tedious, to show that for the cost structure given, any choice of the other agent's rule γ_j(·,t) leads J_i(w, x_i = C_i, z_i(t), u_i) to be concave in z_i when u_i = continue, and constant when u_i = stop and declare A or B. This implies the SPRT structure. Thus the entire solution is characterized by the 4(T−1) parameters {η_i^u(t), η_i^l(t) | i = 1, 2; t = 1, ..., T−1}. □
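The threshold structure of Corollary 2.b can be sketched for a single agent as follows. This is a minimal sketch under assumptions that are not in the paper: a hypothetical Gaussian observation model while x_i = C_i (mean +1 under H_A, −1 under H_B, unit variance), and thresholds η_i^u(t), η_i^l(t) that are simply given rather than optimized against the team cost. The recursion on z_i(t) is Bayes' rule for a binary hypothesis, and once A_i or B_i is entered the agent stays there.

```python
import math


def posterior_step(z, y, mean_a=1.0, mean_b=-1.0, sigma=1.0):
    """Bayes update of z = p(H_A | Y_i(t)) after one more observation y.

    Hypothetical model while x_i = C_i: y ~ N(mean_a, sigma^2) under H_A and
    y ~ N(mean_b, sigma^2) under H_B (the common normalizing constant cancels).
    """
    like_a = math.exp(-0.5 * ((y - mean_a) / sigma) ** 2)
    like_b = math.exp(-0.5 * ((y - mean_b) / sigma) ** 2)
    return z * like_a / (z * like_a + (1.0 - z) * like_b)


def run_agent(observations, z0, upper, lower):
    """Threshold (SPRT-type) rule on z_i(t) with time-varying thresholds.

    Returns the history of (local state, z_i); A and B are trapping states.
    """
    state, z, history = "C", z0, []
    for t, y in enumerate(observations):
        if state == "C":                  # only a continuing agent uses its data
            z = posterior_step(z, y)
            if z >= upper[t]:
                state = "A"               # stop and declare H_A
            elif z <= lower[t]:
                state = "B"               # stop and declare H_B
        history.append((state, z))
    return history


history = run_agent([1.0, 1.0, -5.0], z0=0.5, upper=[0.95] * 3, lower=[0.05] * 3)
```

With these hypothetical numbers the first observation moves z from 0.5 to about 0.88 (inside the thresholds), the second pushes it above 0.95, and the third observation is ignored because A is trapping.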


Thus the decision rules of the decentralized variation of the optimal stopping problem share the structure of those of the centralized solution, but with different parameter values. Theorem 2 ensures that this is an example of a general phenomenon: since (x_i, z_i) is a sufficient statistic in both the centralized (i = 1) and decentralized (i = 1, 2) cases, the basic decision structures are identical.

Before concluding this section, its main result can be related to the original question posed in the introduction by:

Corollary 2c: If the system dynamics are reversible (in (4.1), f(·,t) is one-to-one) in a deterministic, dynamic problem, then x_i(t) and z_i(t) = p(x(t) | Y_i(t)) are a sufficient statistic for each agent.

Proof: Under these conditions, p(x(t) | Y_i(t)) completely specifies p(x(0) | Y_i(t)), which is sufficient by Theorem 2. □
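For a finite state space the argument in this proof can be checked directly: if each f(·,t) is one-to-one, the composite map from x(0) to x(t) is a bijection, so the conditional distribution of x(t) is only a relabelling of that of x(0). The sketch below assumes a hypothetical three-state permutation for f and a made-up conditional distribution standing in for p(x(0) | Y_i(t)).

```python
def compose(maps):
    """Composite map x(0) -> x(t) for a list of one-step maps [f(., 0), ..., f(., t-1)]."""
    g = {}
    for x0 in maps[0]:
        x = x0
        for f in maps:
            x = f[x]
        g[x0] = x
    return g


def push_forward(p0, g):
    """Distribution of x(t) = g(x(0)) when x(0) has distribution p0."""
    pt = {}
    for x0, prob in p0.items():
        pt[g[x0]] = pt.get(g[x0], 0.0) + prob
    return pt


def pull_back(pt, g):
    """Recover the distribution of x(0) from that of x(t); valid only if g is one-to-one."""
    if len(set(g.values())) != len(g):
        raise ValueError("dynamics are not reversible: g is not one-to-one")
    return {x0: pt.get(xt, 0.0) for x0, xt in g.items()}


# Hypothetical one-to-one dynamics on three states, applied for two steps.
f = {0: 1, 1: 2, 2: 0}
g = compose([f, f])
p0_given_y = {0: 0.5, 1: 0.3, 2: 0.2}      # stands in for p(x(0) | Y_i(t))
pt_given_y = push_forward(p0_given_y, g)   # the corresponding p(x(t) | Y_i(t))
recovered = pull_back(pt_given_y, g)       # equals p0_given_y
```

If two states merge under f, `pull_back` refuses to invert, which is exactly where the corollary's hypothesis fails.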
V. Dynamic Problems

Consider now the general case of the problem posed in Section II: x(t) evolves as an autonomous Markov process with white driving noise w(t), and each agent receives noisy observations of the state which depend on a local state. This structure is characteristic of many search and surveillance problems, where x(t) models the trajectory of an object, and the two agents are either searching for, or just tracking, the object. The local states model either the trajectories of the search platforms or the dynamics of the sensors (e.g., pointing a radar).

Following  the  general  procedure  of  reducing  a  partially  nested  team 
problem  to  an  equivalent  static  one,  some  immediate  conclusions  can  be  drawn 
about  sufficient  statistics  in  this  case. 

Theorem 3: Under the basic assumptions A1-A6, a sufficient statistic for each agent in a dynamic estimation problem is the local state x_i(t) in conjunction with the local conditional distribution p(W(t) | Y_i(t)) on the driving noise sequence.

Proof: By replacing each w in the proof of Theorem 2 with W(t), it is easy to show that p(W(T−1) | Y_i(t)) is a sufficient summary of past observations (since W(T−1) can be viewed as an initial, static state which influences the dynamics in a special way). However, by A3, p(W(t:T−1) | Y_i(t)) = p(W(t:T−1)), since w(t), ..., w(T−1) is white. Hence p(W(T−1) | Y_i(t)) can be reconstructed from p(W(t) | Y_i(t)) and the prior information. □
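Written out, the reconstruction step is the factorization below. It assumes that W(T−1) splits into the past block W(t) and the future block W(t:T−1), and that the whiteness invoked through A3 makes the future block independent of both W(t) and Y_i(t).

```latex
p\big(W(T-1) \mid Y_i(t)\big)
  = p\big(W(t) \mid Y_i(t)\big)\, p\big(W(t\!:\!T-1) \mid W(t),\, Y_i(t)\big)
  = p\big(W(t) \mid Y_i(t)\big)\, p\big(W(t\!:\!T-1)\big)
```

The second factor is fixed by the prior, so p(W(t) | Y_i(t)) is the only part that depends on the data.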

The result is constructive, but not as helpful computationally as Theorem 2 was. Here the sufficient statistic increases in dimension with time, a fact which compounds the dimensionality problem encountered in Corollary 2.b. (The sufficient statistic could equally well be taken to be p(X(t) | Y_i(t)) and x_i(t), due to the assumption that w influences y_i and future behavior only through x, and the same problem would exist.) However, no claim is made that this is a minimal sufficient statistic; other sufficient statistics of fixed dimension may exist.


Example: Suppose the main system is linear,

x(t+1) = F(t)\,x(t) + w(t) \in \mathbb{R}^{n}        (5.1)

with w(t) zero-mean and Gaussian. Local observations are linear,

y_i(t) = H_i(t)\,x(t) + v_i(t) \in \mathbb{R}^{l_i}        (5.2)

with v_i(t) zero-mean and Gaussian. Assume the local states are irrelevant, so each agent seeks to produce directly a local "estimate" u_i(t) ∈ R^{m_i} to minimize a quadratic cost function as in (3.7). This is the generalization of Radner's theorem to the dynamic LQG estimation problem.


Corollary 3a: For the decentralized LQG estimation problem, the local conditional mean of the current state is a sufficient statistic.*

* Superficially, this seems to contradict the results of [13], where a sufficient statistic was found which increased in dimension with the number of agents. However, that work treated correlated observation noise directly; if Lemma 1 were used to transform that problem to this setting, it would result here in a new state x whose dimension depends upon the number of agents, and the results are compatible.

Proof. From Theorem 3, p(W(t) | Y_i(t)) is a sufficient statistic. By elementary properties of Gaussian random variables under linear observations, this distribution is Gaussian, specified by a covariance independent of Y_i(t) and by the conditional mean E{W(t) | Y_i(t)}. By the same argument used in
Corollary 1.b, each γ_i(·,t) is chosen to minimize the individual term E{J(x(t), u_1(t), u_2(t))}. Since J is quadratic, and x(t) is a linear combination of the elements of W(t), this is a static LQG team problem and Radner's theorem applies. With state W(t), this cost is

J = \begin{bmatrix} W^T(t) & u_1^T & u_2^T \end{bmatrix}
    \begin{bmatrix} F^T & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & I \end{bmatrix}
    \begin{bmatrix} S_{00} & S_{01} & S_{02} \\ S_{10} & S_{11} & S_{12} \\ S_{20} & S_{21} & S_{22} \end{bmatrix}
    \begin{bmatrix} F & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & I \end{bmatrix}
    \begin{bmatrix} W(t) \\ u_1 \\ u_2 \end{bmatrix}        (5.3)

where

F = \begin{bmatrix} \Phi(t,0) & \Phi(t,1) & \cdots & \Phi(t,t) \end{bmatrix}        (5.4)

and Φ(t,τ) is the n×n system transition matrix

\Phi(t,t) = I, \qquad \Phi(t,\tau) = \Phi(t,\tau+1)\,F(\tau).        (5.5)

By Radner's theorem, the optimal decision rule is

u_i = -G_i(t)\,E\{W(t) \mid Y_i(t)\}        (5.6)

where

G_1(t) = \left[S_{11} - S_{12}S_{22}^{-1}S_{21}\right]^{-1}\left[S_{10} - S_{12}S_{22}^{-1}S_{20}\right]F.        (5.7)

Writing G_i(t) = G_i^* F, the decision rule is then

u_i^* = -G_i^*\,E\{F\,W(t) \mid Y_i(t)\} = -G_i^*\,E\{x(t) \mid Y_i(t)\}.        (5.8)

□
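The identity x(t) = F W(t), used in the last step of the proof, can be checked numerically. The sketch below assumes that W(t) stacks x(0), w(0), ..., w(t−1) in that order, so the block row of transition matrices runs from Φ(t,0) to Φ(t,t) = I. It builds each Φ(t,τ) from the recursion Φ(t,t) = I, Φ(t,τ) = Φ(t,τ+1)F(τ) and compares the result with direct simulation of (5.1). The random system matrices are illustrative only.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def transition(F_seq, t, tau):
    """Phi(t, tau) from Phi(t, t) = I and Phi(t, tau) = Phi(t, tau + 1) F(tau)."""
    phi = identity(len(F_seq[0]))
    for s in range(t - 1, tau - 1, -1):
        phi = matmul(phi, F_seq[s])
    return phi


def stacked_F(F_seq, t):
    """Block row [Phi(t,0) | Phi(t,1) | ... | Phi(t,t)] (assumed ordering of W(t))."""
    n = len(F_seq[0])
    blocks = [transition(F_seq, t, tau) for tau in range(t + 1)]
    return [[blk[i][j] for blk in blocks for j in range(n)] for i in range(n)]


def simulate(F_seq, x0, ws):
    """Run x(s+1) = F(s) x(s) + w(s) for s = 0, ..., t-1."""
    x = x0
    for F, w in zip(F_seq, ws):
        x = [a + b for a, b in zip(matvec(F, x), w)]
    return x


rng = random.Random(0)
n, t = 2, 4
F_seq = [[[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)] for _ in range(t)]
x0 = [rng.gauss(0, 1) for _ in range(n)]
ws = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(t)]
W = x0 + [c for w in ws for c in w]        # W(t) = [x(0); w(0); ...; w(t-1)]
x_direct = simulate(F_seq, x0, ws)
x_stacked = matvec(stacked_F(F_seq, t), W)
```

The two vectors agree to rounding error, which is what allows E{F W(t) | Y_i(t)} to be rewritten as E{x(t) | Y_i(t)}.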

This  implies  that  for  the  dynamic  LQG  estimation  problem,  the  local 
Kalman  filter  estimate  is  indeed  the  sufficient  statistic.  If  care  is 
taken  to  use  Lemma  1  to  define  x(t)  so  that  the  spatial  Markovian  property 
holds,  then  an  elegant  result  emerges  which  leads  to  a  computationally 
feasible  solution. 
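A minimal sketch of the local computation each agent performs, assuming a scalar, time-invariant instance of (5.1)-(5.2) with hypothetical parameters. Each agent runs its own Kalman filter on its own observations and applies u_i(t) = −G_i^* E{x(t) | Y_i(t)}. The gains below are placeholders, not the team-optimal values. The run also illustrates the fact used in the proof of Corollary 3a: the error variance does not depend on the observed data.

```python
import random


def local_filter(ys, F, Q, H, R, m0, P0):
    """Scalar Kalman filter returning E{x(t) | Y_i(t)} for each t and the final variance.

    Model: x(t+1) = F x(t) + w(t), w ~ N(0, Q); y_i(t) = H x(t) + v_i(t), v_i ~ N(0, R);
    prior x(0) ~ N(m0, P0).
    """
    xhat, P = m0, P0
    means = []
    for t, y in enumerate(ys):
        if t > 0:
            # Predict x(t) from the estimate of x(t-1).
            xhat, P = F * xhat, F * P * F + Q
        # Update with the local observation y_i(t).
        K = P * H / (H * P * H + R)
        xhat, P = xhat + K * (y - H * xhat), (1 - K * H) * P
        means.append(xhat)
    return means, P


# Hypothetical parameters: agent 1 has an accurate sensor, agent 2 a noisy one.
F, Q, T = 0.9, 0.5, 50
H1, R1 = 1.0, 0.2
H2, R2 = 1.0, 2.0
G1_star, G2_star = -1.0, -1.0   # placeholder gains: u_i(t) then equals the local estimate

rng = random.Random(1)
x = rng.gauss(0.0, 1.0)
y1s, y2s = [], []
for t in range(T):
    if t > 0:
        x = F * x + rng.gauss(0.0, Q ** 0.5)
    y1s.append(H1 * x + rng.gauss(0.0, R1 ** 0.5))
    y2s.append(H2 * x + rng.gauss(0.0, R2 ** 0.5))

m1, P1 = local_filter(y1s, F, Q, H1, R1, 0.0, 1.0)
m2, P2 = local_filter(y2s, F, Q, H2, R2, 0.0, 1.0)
u1 = [-G1_star * m for m in m1]   # u_1(t) = -G_1^* E{x(t) | Y_1(t)}
u2 = [-G2_star * m for m in m2]   # u_2(t) = -G_2^* E{x(t) | Y_2(t)}
```

Because each filter uses only its own y_i, no communication between agents is needed at run time; only the gains couple the two decision rules.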

Another interesting point is that the decision rule γ_i(·,t), as specified by G_i^*, is identical to the rule that would have been used in the static case if x(t) were generated alone at time t, with no prior dynamics, and each agent had received an observation y_i(t) producing E{x(t) | Y_i(t)} as the conditional mean. Not only does the static nature of the cost yield separation in time of the computation of the decision rules, but the fact that x(t) arose as part of a dynamic process does not matter either.

Thus far, several problems have been identified for which the local state and local conditional distribution are sufficient statistics. In the general case, at least so far, only the sufficiency of p(W(t) | Y_i(t)) has been shown. Is that as far as we can go, or is the LQG problem indicative of the fact that one more step can be taken to show that p(x(t) | Y_i(t)) is sufficient in general?
VI. When the State is Not Enough

Whether or not p(W(t) | Y_i(t)) is as far as one can go is best addressed by example. Essentially, W(t) includes information on the entire past trajectory of the process, and we are interested in determining if and when the current state x(t) is enough. Since the system is Markov, and in light of the results thus far, one might conjecture that it is.

Consider  a  simple,  discrete  state  example.  x(t)  evolves  as  a  Markov 
chain,  depicted  in  Figure  4.  The  states  can  be  interpreted  as 

N:  normal  state 
W:  transient  warning  state 
E:  short-lived  emergency  state 

Agent  1  has  perfect  state  information;  agent  2  cannot  distinguish  between 
N  and  E,  but  observes  each  W  (and  thus  may  infer  the  succeeding  E) . 

Each agent makes one of two decisions at each time:

u_i = 0: the system is in N or W
u_i = 1: the system is in the E state.

Penalties are assessed as follows (and added if several apply):

(a) 10,000 whenever u_1(t) ≠ u_2(t)

(b) 100 whenever u_i(t) = 1 and x(t) ∈ {N, W}

(c) 1 whenever u_i(t) = 0 and x(t) = E.

Thus the agents seek to (a) agree, (b) not generate false alarms, and (c) report emergencies.


A weaker conjecture than the claim that p(x(t) | Y_i(t)) is sufficient is the following.

Conjecture: If a decision agent has perfect state information in a dynamic, decentralized estimation problem, then its optimal decision rule is a function of the current observation only, i.e., of the current state.

This is certainly true in the single-agent case. Consider its consequences in the context of this example.

(1) Cost (a) dominates, as its magnitude relative to the other costs is larger than any ratio of probabilities. Clearly a decision rule exists which never incurs penalty (a), such as u_1(t) = u_2(t) = 0 regardless of the data.

(2) Cost (b) is next most significant, and the same decision rule mentioned above also guarantees that (b) will never be incurred. Thus an upper bound on the average cost per stage is 5/19, the steady-state probability that E is occupied.

(3) By the conjecture and (2), agent 1 must choose u_1 = 0 whenever it sees x ∈ {N, W}.

(4) There will be times, long after the most recent W, when agent 2 is not certain whether the state is N or E. By (3) and (1), it must choose u_2 = 0 in these cases.

(5) There is a possibility that the system is in state E in cases such as (4). Agent 2 will be choosing u_2 = 0, so by (1) agent 1 must also choose u_1 = 0.
(6) By the conjecture, since agent 1 must choose u_1 = 0 when x(t) = E in some cases, it must do so in all cases. Thus, if the conjecture holds, the decision rule defined in (2) must be optimal. □

It is not. By modifying the rule so that u_1 = u_2 = 1 every time E is entered immediately after a W, all criteria can be satisfied. Since this is a recurrent event, detectable by both agents, and penalty (c) is not incurred under the modified rule but is under the original, the modified rule must be strictly better in terms of average cost. However, this is achieved only if agent 1 remembers whether E was entered from W or from {N, E}, and this is more than just the current state. Thus there are cases where p(x(t) | Y_i(t)) is not enough.
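The comparison between the two rules can be made exact for a concrete chain. Figure 4's transition probabilities are not reproduced here, so the sketch uses a hypothetical chain chosen only to agree with the numbers quoted in the text (stationary probability 5/19 for E, and a .8/.2 split on {N, E} as seen by agent 2). It also assumes that penalties (b) and (c) are charged at most once per stage, the reading under which the all-zero rule costs P(E) per stage.

```python
from fractions import Fraction as Fr

# Hypothetical transition probabilities, NOT taken from Figure 4.
P = {
    "N": {"N": Fr(2, 3), "W": Fr(1, 6), "E": Fr(1, 6)},
    "W": {"E": Fr(1)},                      # a warning is always followed by E
    "E": {"N": Fr(4, 5), "E": Fr(1, 5)},
}
STATES = ["N", "W", "E"]


def stationary(P):
    """Solve pi = pi P with sum(pi) = 1 exactly (Gauss-Jordan over Fractions)."""
    n = len(STATES)
    # Row j < n-1: sum_i pi_i (P[i][j] - delta_ij) = 0; last row: sum_i pi_i = 1.
    A = [[P[si].get(sj, Fr(0)) - (1 if i == j else 0) for i, si in enumerate(STATES)] + [Fr(0)]
         for j, sj in enumerate(STATES[:-1])]
    A.append([Fr(1)] * n + [Fr(1)])
    for c in range(n):
        r = next(k for k in range(c, n) if A[k][c] != 0)
        A[c], A[r] = A[r], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for k in range(n):
            if k != c and A[k][c] != 0:
                factor = A[k][c]
                A[k] = [a - factor * b for a, b in zip(A[k], A[c])]
    return {s: A[k][n] for k, s in enumerate(STATES)}


def penalty(x, u1, u2):
    """Stage cost (a) + (b) + (c), each charged at most once per stage (assumption)."""
    cost = 0
    if u1 != u2:
        cost += 10_000                      # (a) disagreement
    if 1 in (u1, u2) and x in ("N", "W"):
        cost += 100                         # (b) false alarm
    if 0 in (u1, u2) and x == "E":
        cost += 1                           # (c) missed emergency
    return cost


def rule_zero(prev_x, x):
    return 0, 0                             # u_1 = u_2 = 0 regardless of the data


def rule_modified(prev_x, x):
    # Declare E only when E is entered right after W. Agent 1 sees (prev_x, x); agent 2
    # saw the W and knows W is followed by E, so both can apply this rule.
    u = 1 if (prev_x == "W" and x == "E") else 0
    return u, u


def average_cost(rule, P):
    """Exact long-run cost per stage, summing over transitions (x(t-1), x(t))."""
    pi = stationary(P)
    return sum(pi[a] * p * penalty(b, *rule(a, b)) for a in STATES for b, p in P[a].items())


print(stationary(P)["E"], average_cost(rule_zero, P), average_cost(rule_modified, P))
# prints: 5/19 5/19 3/19
```

Under these assumptions the modified rule lowers the average cost from 5/19 to 3/19: it avoids penalty (c) exactly on the visits to E that follow W, which occur with stationary probability 2/19. Note that `rule_modified` needs the previous state, which is the memory beyond x(t) that the argument above identifies.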

The curious thing about this example is that it is possible to determine exactly what a sufficient statistic is, and that statistic is finite. Consider agent 2: a Bayesian state estimator for it can be in one of three states z_2(t), representing either E or W with probability 1, or the distribution {p(N) = .8, p(E) = .2, p(W) = 0}. (Note that this latter state is trapping until the next W is observed since, for this choice of transition probabilities, the distribution on {N, E} achieves steady state after one time step.) Agent 1 can infer agent 2's observations from the original state trajectory, and hence knows its estimator state z_2(t). Viewing the original system and agent 2's estimator together as a composite, discrete-state system, agent 1 sees a system which can be in one of four states (Figure 5.1). Thus agent 1's estimator of the combination tracks not only the actual state (upper section of each box) but also the state of agent 2 (lower section).
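The finite estimators described above can be enumerated mechanically. The sketch below uses a hypothetical chain (not Figure 4's values, but chosen so that agent 2's belief away from a warning is .8/.2 on {N, E}), assumes agent 2 starts in that mixed belief, and lists the composite states (x(t), z_2(t)) that agent 1 can find itself tracking.

```python
from fractions import Fraction as Fr

# Hypothetical transition probabilities, NOT read from Figure 4.
P = {
    "N": {"N": Fr(2, 3), "W": Fr(1, 6), "E": Fr(1, 6)},
    "W": {"E": Fr(1)},
    "E": {"N": Fr(4, 5), "E": Fr(1, 5)},
}
STATES = ("N", "W", "E")


def agent2_update(belief, saw_w):
    """One Bayes step for agent 2, who observes only whether x(t) = W."""
    pred = {s: sum(belief[r] * P[r].get(s, Fr(0)) for r in STATES) for s in STATES}
    if saw_w:
        return {"N": Fr(0), "W": Fr(1), "E": Fr(0)}
    total = pred["N"] + pred["E"]
    return {"N": pred["N"] / total, "W": Fr(0), "E": pred["E"] / total}


def as_key(belief):
    return tuple(belief[s] for s in STATES)


def reachable_pairs(initial_belief):
    """Enumerate the composite states (x(t), z_2(t)) that agent 1 can observe."""
    start = ("N", as_key(initial_belief))
    beliefs = {start[1]: initial_belief}
    seen, frontier = {start}, [start]
    while frontier:
        x, zkey = frontier.pop()
        for x_next in P[x]:                  # only positive-probability moves are listed
            z_next = agent2_update(beliefs[zkey], x_next == "W")
            pair = (x_next, as_key(z_next))
            beliefs[pair[1]] = z_next
            if pair not in seen:
                seen.add(pair)
                frontier.append(pair)
    return seen


mix = {"N": Fr(4, 5), "W": Fr(0), "E": Fr(1, 5)}   # assumed initial belief of agent 2
pairs = reachable_pairs(mix)
agent2_states = {z for _, z in pairs}
```

Under these assumptions the enumeration yields four composite states and three distinct beliefs for agent 2, matching the counts in the text; the extra composite state is E entered from W.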




Similarly, agent 2 can view this extended estimator of agent 1 in combination with the system, and construct a new joint estimator. Surprisingly, it still has three states (Figure 5.2), since states 3 and 4 of agent 1's estimator are not distinguishable to agent 2. Thus finite estimators with states z_i(t) can be found for each agent. When used to augment the system state to (x(t), z_i(t)), these produce a composite system whose Bayes estimator is the other agent's estimator, with state z_j(t). (Moreover, in this case, both z_1 and z_2 are finite.) Note that this is true for any cost function, not just the example cost above; note also that the only change from the computation of p(x(t) | Y_i(t)) has been the addition of a state to agent 1's estimator representing the special case where E is entered from W.
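Agent 2's view of the composite system can be checked the same way. The sketch hard-codes the four composite states (x, z_2) that arise under a hypothetical chain (N to N 2/3, W 1/6, E 1/6; W to E; E to N 4/5, E 1/5; not Figure 4's values), runs agent 2's Bayes filter over them using only the "is it W?" observation, and collects the distinct beliefs that occur. The starting belief is an assumption.

```python
from fractions import Fraction as Fr

# Composite states (x, z_2) under a hypothetical chain (NOT Figure 4's values):
#   "N|mix": x = N, agent 2 holds the .8/.2 belief on {N, E}
#   "W|W"  : x = W, agent 2 has just seen the warning
#   "E|E"  : x = E entered from W, agent 2 is certain of E
#   "E|mix": x = E entered from N or E, agent 2 holds the .8/.2 belief
Q = {
    "N|mix": {"N|mix": Fr(2, 3), "W|W": Fr(1, 6), "E|mix": Fr(1, 6)},
    "W|W": {"E|E": Fr(1)},
    "E|E": {"N|mix": Fr(4, 5), "E|mix": Fr(1, 5)},
    "E|mix": {"N|mix": Fr(4, 5), "E|mix": Fr(1, 5)},
}


def update(belief, saw_w):
    """Agent 2's Bayes step over composite states; it only sees whether x(t) = W.

    Returns None when the observation has zero predicted probability.
    """
    pred = {}
    for r, pr in belief.items():
        for s, p in Q[r].items():
            pred[s] = pred.get(s, Fr(0)) + pr * p
    kept = {s: v for s, v in pred.items() if (s == "W|W") == saw_w and v != 0}
    total = sum(kept.values())
    if total == 0:
        return None
    return {s: v / total for s, v in kept.items()}


def reachable_beliefs(initial):
    """Collect every distinct belief agent 2 can hold, starting from `initial`."""
    seen = {frozenset(initial.items())}
    frontier = [initial]
    while frontier:
        b = frontier.pop()
        for saw_w in (False, True):
            nxt = update(b, saw_w)
            if nxt is not None and frozenset(nxt.items()) not in seen:
                seen.add(frozenset(nxt.items()))
                frontier.append(nxt)
    return seen


# Assumed starting belief: agent 2's mixed .8/.2 state.
beliefs = reachable_beliefs({"N|mix": Fr(4, 5), "E|mix": Fr(1, 5)})
```

Only three beliefs occur, and "N|mix" and "E|mix" only ever appear together inside one of them, which is the sense in which those two composite states are indistinguishable to agent 2.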

The conclusions to be drawn are that examples exist where p(x(t) | Y_i(t)) is not a sufficient statistic, but that other sufficient statistics do exist. This example is a bit contrived, as the transition probabilities between E and N were chosen so that agent 2's estimator was finite; normally it would be countably infinite. However, there are suggestions of a procedure for generating sufficient statistics which do apply, but these must wait for a sequel [14].
VII. Conclusions

Theorem 3 is the principal result of this work. In any decentralized problem with the structure specified in Section II, each agent must estimate at most the history of system driving noises, which is equivalent to the state trajectory. The intuition behind this is demonstrated by the example in Section VI: the past state sequence provides information about the past information received by other agents, and hence allows their decisions to be predicted more accurately than would be possible on the basis of the current state alone.

However, the special cases of Section IV, and the LQG dynamic case, show that the local conditional state distributions are sufficient for a number of interesting cases (which include local dynamics), and this reduces the choice of decision rules to seeking memoryless maps from x_i and z_i into u_i. If the infinite time horizon problem were addressed via asymptotic methods, then the search would be further reduced to that of finding a steady-state decision rule of this form (assuming a steady state exists).

The most promising result for future work is the example of Section VI. It illuminates both the nature of the second-guessing phenomenon in decentralized estimation and the fact that the general dynamic case is not always infinitely complex. It is suspected that an algebraic theory of "decentralized realizations" will be required to find structures for the memory of each agent which, taken in conjunction with the system dynamics, produce estimators for other agents which satisfy the symmetric conditions.